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ABSTRACT 

One  view  of  numerical  software  is  that  it  is  simply  a  computer  implementation  of  a  known 
method.  Implicit  in  this  view  is  the  assumption  that  the  flow  of  information  is  in  one  direction 
only.  However,  developments  in  methods  and  software  are  intimately  related,  and  neither  is 
complete  if  considered  in  isolation.  In  this  paper,  we  illustrate'  how  the  development  of  numerical 
software'  has  influenced  emr  research  in  optimisation  methods. 
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1.  Introduction 

It  is  almost  a  truism  that  an  optimizat  ion  method  cannot  be  treated  as  practical  unless  an  imple¬ 
mentation  has  boon  produced  and  a  significant  amount  of  computation  performed.  Thus,  research 
on  optimization  methods  necessarily  overlaps  heavily  with  the  development,  of  software.  Since 
the  advent  of  serious  work  on  numerical  software,  much  has  been  written  about  the  complexi¬ 
ties  that  arise  when  transforming  any  mathematical  algorithm  into  an  implementation  (see,  c.g., 
t'ody,  1974;  dill  ef  a /..  1979;  Cowell,  1983).  Although  most  workers  in  optimization  are  aware 
of  such  issues,  the  effect  of  implementation  on  methods  is  much  less  widely  understood.  In  fact, 
the  relationship  between  methods  and  software  is  sometimes  described  simply  by  defining  an 
implementation  as  a  concrete  realization  of  a  theoretical  algorithm. 

In  our  view,  this  statement  does  not  include  the  crucial  influence  that  implementation  may 
have  on  theoretical  algorithms.  Implementation  must  always  lx*  considered  by  the  algorithm 
designer  in  the  sense  that  tin*  steps  of  a  theoretical  method  should  be  implementable.  However, 
our  experience  suggests  that  implementation  has  a  much  more  substantive  effect  on  methods.  This 
paper  accordingly  develops  the  theme  that  software  creates  new  methods,  and  that  a  method  can 
be  produced  in  its  most  effective  form  only  in  conjunction  with  a  careful  implementation.  In  order 
to  avoid  vague  generalities,  our  own  experiences  with  specific  issues  will  be  cited  to  illustrate  the 
sometimes  subtle  interconnections  that  can  occur.  Tims,  we  shall  describe  the  evolution  of  certain 
methods  as  a  result,  of  implementation. 

Initialixation  and  communication  are  two  critical  areas  in  which  the  process  of  implementa¬ 
tion  influences  an  algorithm.  In  describing  the  computation  associated  with  an  algorithm,  the 
basic  iteration  is  usually  the  main  concern.  In  implementing  a  method,  however,  initialization 
is  crucial  not  only  the  computation  that  must  be  performed  to  initiate  the  method,  but  also 
the  information  that  is  communicated-  to  and  from  the*  algorithm.  In  our  experience,  the  na¬ 
ture,  cost  and  difficulty  of  initialization  procedures  may  be  obscured  by  the  purely  mathematical 
description  of  an  algorithm.  Furthermore,  the  representation  of  the  information  needed  by  an 
algorithm  c.g.,  a  particular  matrix  factorization  is  often  left  undefined  until  required  by  an 
implementation.  This  latter  tendency  sometimes  leads  to  a  surprisingly  optimistic  belief  in  “black 
box"  software;  we  shall  describe  several  examples  in  which  the  use  of  "off-the-shelf"  codes  in  a 
complex  algorithm  introduces  sub-’anlial  inefficiencies  because  of  poor  communical ion  between 
software  modules. 

A  careful  implementation  also  illuminates  questions  of  detail  that  are  typically  ignored  in  a 
purely  theoretical  setting  In  tlx*  analysis  of  algorithms,  a  “good"  proof  should  have  the  widest 
possible  application  and  generality.  In  order  to  facilitate  the  construction  of  such  proofs,  methods 
are  often  described  with  a  comparable  level  of  abstraction.  For  example,  theoretical  descriptions 
of  active-set  nonlinear  programming  methods  typically  treat  linear  and  nonlinear  constraints  in 
a  uniform  way.  Consequently.  it  i>  not  always  appreciated  that  substantial  improvements  in 
performance  can  result  when  algorithms  exploit  the  different  properties  of  linear  and  nonlinear 
constraints.  The  scope  lor  increased  elliciency  is  not  restricted  to  savings  such  as  avoiding  the 
rccompiitat  ion  of  gradients  for  linear  constraints.  Linear  constraints  have  tin*  important  property 
that,  from  any  non-opt  imal  feasible  point,  directions  can  be  generated  along  which  there  must 
exist  intervals  of  feasible  points.  This  property  allows  the  development  of  methods  that  always 
retain  feasibility  with  respect  to  the  linear  constraints.  For  such  methods,  the  path  to  the  solution 
and  the  number  of  iterations  lend  to  differ  completely  from  those  of  methods  that  ignore  linearity. 
The  many  non-obvious  effects  of  separate  treatment,  of  linear  and  nonlinear  constraints  will  be  a 
recurrent  topic  in  this  paper. 
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2.  Background  on  quadratic  and  nonlinear  programming 

2.1.  Quadratic  programming.  Many  of  our  examples  will  be  drawn  from  a  relatively  simple 
optimization  problem  quadratic  programming  (QP): 

minimize  cTx+^xTHx 

llrS"  1 

subject  to  t.  <  (  X  )  <  u, 
l  Ax  t 

where  H  is  symmetric.  The  constraints  involving  the  matrix  A  will  be  called  the  general  con¬ 
straints;  the  remaining  constraints  will  be  calk'd  bounds.  An  equality  constraint  corresponds  to 
sotting  ft  =  u,.  Similarly,  a  special  "infinite’'  value  for  (,  or  ut  is  used  to  indicate  tbc  absence  of 
one  of  the  bounds. 

In  general,  an  iterative  process  is  required  to  solve  a  quadratic  program.  (For  simplicity,  we 
shall  always  consider  a  typical  iteration  and  avoid  reference  to  the  index  of  the  iteration.)  Each 
new  iterate  x  is  defined  by 

x  =  x  +  ap,  (1) 

where  the  step  length  a  is  a  non-negative  scalar,  and  p  is  called  the  search  direction. 

Throughout  this  paper,  we  shall  consider  o/ily  a  particular  active-set  feasible-point  method 
for  quadratic  programming  (see  (dill  et.  ah.  1981a).  An  important  feature  of  the  method  is  that, 
once  any  iterate  is  feasible,  all  subsequent  iterates  remain  feasible. 

The  essence  of  the  method  is  the  definition  of  a  working  set  of  constraints  (general  and 
bound)  that  are  satisfied  exactly  at  x.  The  search  direction  is  constructed  so  that  the  constraints 
in  the  working  set  remain  unaltered  for  any  value  of  the  step  length.  For  a  bound  constraint  in 
the  working  set,  this  property  is  achieved  by  setting  the  corresponding  component  of  the  search 
direction  to  zero.  Thus,  the  associated  variable  is  fixed,  and  specification  of  the  working  set 
induces  a  partition  of  x  into  li.xcd  and  free  variables.  During  a  given  iteration,  the  fixed  variables 
are  effectively  removed  from  the  problem:  since  the  relevant  components  of  the  search  direction 
are  zero,  the  columns  of  A  corresponding  to  fixed  variables  may  be  ignored. 

Lot  m  denote  the  number  of  general  constraints  in  the  working  sot  and  let  n,.u  denote  the 
number  of  free  variables.  Lot  (,’  denote  the  m  x  n,,„  submatrix  of  general  constraints  in  the 
working  sot  corresponding  to  the  free  variables,  and  let  />  denote  the  search  direction  with  respect 
to  the  free  variables  only.  The  general  constraints  in  the  working  set  will  ho  unaltered  by  any 
move  .along  p  if 

Cp  =  0.  (2) 

In  order  to  compute  p,  the  TQ  factorization  of  C  is  used: 

CQ  -{  0  T),  (3) 

where  T  is  an  rn  x  m  reverse-triangular  matrix  (i.e..  0  if  i  t  j  <  m),  and  the  non-singular 

matrix  Q  is  the  product  of  either  orthogonal  or  stabilized  elementary  transformations  (see  (Jill  et 
a/..  1981c).  hi  this  paper,  we  consider  only  the  case  where  Q  is  orthogonal.  If  T  is  non-singular 
and  the  columns  of  Q  are  partitioned  so  that 

Q  =  (Z  Y),  (4) 

where  Y  is  nK„  x  m,  then  the  (nMI  m)  columns  of  Z  form  a  basis  for  the  null  space  of  C.  Thus, 
p  will  satisfy  (2)  only  if 


for  sonic  vector  pz. 

The  definition  of  pz  in  (5)  depends  on  whether  the  current  point  is  feasible.  If  not,  pz  is 
taken  as  -  ZTq ,  where  q  is  the  gradient  of  the  current  sum  of  infeasibilities  with  respect  to  the 
free  variables.  Otherwise,  p,  is  the  solution  of 

RjRgPg=-ZTq,  (6) 

where  Ry  is  upper  triangular  and  <j  is  the  gradient  of  the  quadratic  objective  function  with  respect 
to  the  free  variables.  The  Cholesky  factor  R  y_  is  closely  related  to  the  projection  of  the  Hessian 
matrix  (with  respect  to  the  free  variables)  into  the  subspace  defined  by  Z. 

At  each  iteration,  the  working  set  is  changed  by  adding  or  deleting  constraints.  Each  change 
in  the  working  set  leads  to  a  simple  change  to  (':  if  the  status  of  a  general  constraint  changes, 
a  row  of  ('  is  altered;  if  a  hound  constraint  enters  or  leaves  the  working  set,  a  column  of  C 
changes.  The  method  recurs  explicit  representations  of  T,  Q,  Ry  and  qQ  (the  vector  QTq,  where 
q  (:£  c  f  II  x)  is  the  gradient  of  the  quadratic  objective  function). 

2.2.  Nonlinear  programming.  The  second  problem  to  be  discussed  is  the  nonline  or  program¬ 
ming  problem: 

NP 


minimize 

F(x) 

xeSR" 

f  X 

subject  to 

f-  <  j  Ahx 

{  c(z) 

where  F{x)  (the  objective  function)  is  a  smooth  nonlinear  function,  A,,  is  a  constant  matrix  of 
linear  constraints,  and  r(x)  is  a  vector  of  smooth  nonlinear  constraint  functions.  Note  that  simple 
bounds  and  linear  constraints  are  represented  separately  from  nonlinear  constraints. 

It  is  widely  considered  today  that  sequential  quadratic  programming  (SQP)  methods  are 
the  most  effective  general  techniques  for  treating  constraint  lionliuearit ies.  The  idea  of  solving 
nonlinear  programs  by  a  sequence  of  quadratic  programming  subprohleins  was  first  suggested  by 
Wilson  (1003).  SQP  methods  were  popularized  mainly  by  Biggs  (1972),  Han  (197G)  and  Powell 
(1977).  (A  brief  history  of  SQP  methods  and  an  extensive  bibliography  are  given  in  (till,  Murray 
and  Wright.  1981.  For  a  survey  of  recent  results  and  references,  see  Powell,  1983.) 

The  basic  structure  of  an  SQP  met  bod  involves  major  and  minor  iterations.  Associated  with 
each  major  iteration  are  a  search  direction  p.  a  set  of  multipliers  p  for  the  nonlinear  constraints, 
and  a  working  set  of  nonlinear  constraint  s  (a  predict  ion  of  t  lie  const  mints  that  are  sat  islied  exactly 
at  the  solution).  In  the  methods  considered  in  this  paper,  all  of  these  quantities  are  computed 
from  a  quadratic  programming  subproblem.  Tlx1  vcx-t.or  p  of  (1)  is  the  solution  itself,  p  is  the 
Lagrange  multiplier  vector  from  the  QP  siibprobloin,  and  the  "nonlinear"  working  set  is  the  final 
active  set  of  the  snbproblem.  Since  solving  such  a  subproblem  is  itself  an  iterative  procedure,  the 
minor  iterations  of  an  SQP  method  are  those1  of  the  QP  method. 

The  QP  snbproblem  is  defined  by  a  set  of  linear  constraints  and  a  quadratic  objective  function. 
In  most  SQP  methods.  Die  linear  constraints  of  the  suhprohlciu  are  linearizations  of  the  original 
constraints  about  Die  current  point.  In  order  to  attain  rapid  convergence,  the  objective  function 
of  the  snbproblem  must  approximate  the  Lagraiigiau  function.  Each  major  iteration  thus  includes 
a  quadratic  programming  suhprohlciu  of  the  form 

minimize  rjTp  +  lpTHp 

p 


subject  to  (. 


f-  <  I  A,.p  \  <  a, 


where  g  is  tlir  gradient  of  F  at  x,  the  matrix  II  is  an  approximation  to  the  Hessian  of  the 
Lagrangian  function,  and  .4*  is  the  Jacobian  matrix  of  c(x)  evaluated  at  x.  Let.  i  in  NP  be 
partitioned  into  three  sections  -  f,  and  corresponding  to  the  bound,  linear  and 

nonlinear  constraints.  The  vector  (  is  similarly  partitioned,  and  is  defined  as 

f n  -—  In  X*  I-l  ~  I  l  ALx,  and  fN  ~  1^  c, 

where  c  is  the  vector  of  nonlinear  constraints  evaluated  at  x.  The  vector  u  is  defined  in  an 
analogous  fashion. 

Having  obtained  p.  the  major  iteration  proceeds  by  determining  a  sfeplength  a  in  (1)  that 
produce's  a  "sufficient  decrease"  in  some  merit  function  (a  combination  of  the  objective  and 
constraint  functions  that  measures  progress  toward  tilt'  solution). 

because  exact  second  derivatives  may  fit'  unavailable  in  practice,  most  work  on  SQP  methods 
has  concentrated  on  the  case  in  which  the  matrix  H  in  the  subproblem  is  a  positive-definite 
ipiasi-Newton  approximation  to  the  Hessian  of  the  Lagrangian  function.  (For  a  review  of  quasi- 
Newton  methods,  see  Dennis  and  Schnabel,  1983.)  After  x  has  been  obtained,  the  new  Hessian 
approximation  is  typically  defined  as  a  low-rank  modification  of  the  old,  so  that  the  Hessian 
matrices  of  successive  quadratic  programming  subproblems  are  related  in  a  special  way  (sec 
Section  G).  With  the  DFCS  update,  for  example,  the  new  matrix  II  is  given  by 

<7> 

where  s  —  x-x  (the  change  in  x).  and  ye  is  the  difference  in  gradients  of  the  Lagrangian  function, 
i.e., 

ye  ~  ye  -  ye  -  (y  -  ATN  X)  -  (g  -  ATN  A), 

when'  A  and  X  are  Lagrange  multiplier  estimates  at  the  new  and  old  points.  In  most  SQP  methods, 
both  X  and  A  are  set  to  /t,  the  multipliers  of  t In'  latest  subproblem. 

It  should  be  clear  from  the  above  description  that  an  SQP  method  is  inherently  complex.  Any 
implementation  of  an  SQP  method  must  include:  (1)  the  solution  of  a  quadratic  programming 
subproblem  (with  some  provision  for  the  treatment  of  inconsistent  constraints);  (2)  definition  of 
a  merit  function:  and  (3)  specification  of  the  Hessian. 


3.  Finding  an  initial  feasible  point 

As  a  brief  example  of  how  an  algorithm  description  can  be  misleading  with  respivt  to  the  efficiency 
of  the  implementation,  consider  an  SQP  method  in  which  a  so-called  two-phase  (primal)  quadratic 
programming  method  is  used  to  solve  the  subproblem.  The  two  phase's  are:  finding  an  initial 
feasible  point  by  minimi/, ing  the  sum  of  iufeasibilil u-s  (tin1  feasibility  phase),  and  minimizing 
the  quadratic  objective  function  in  the  feasible  region  (I  he-  QP  phase).  When  describing  such 
a  method,  it  is  convenient  to  state  simply  that  an  initial  feasible  point,  must  bo  found  by  a 
phase- i  procedure  ((lie  present  authors  have  done  so  many  limes!).  Unfortunately,  this  form 
of  description  may  be  (wrongly)  interpreted  as  implying  that  a  “black  box”  phase- 1  simplex 
linear  programming  procedure  should  be  invoked  before  the  QP  phase.  If  so,  serious  inefficiency 
could  result  if  several  simplex  iterations  were  required  in  order  to  find  a  feasible  point  (since  a 
vertex  solution  would  need  to  be  created);  standard  linear  programming  software  is  also  highly 
unlikely  to  produce  the  factorizations  needed  to  initiate  the  QP  phase.  Consequently,  a  two-phase 
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qu;  idr.it  ir  programming  mot  lux]  might  bo  considered  as  inolFoctivo  simply  booauso  of  a  lack  of 
detail  in  specifying  the  initialization  procedure. 

On  the  contrary,  the  proper  implementation  of  a  two-phase  quadratic  programming  method 
should  reflect  the  essential  sameness  of  the  linear  algebraic  computations  associated  with  iterations 
in  both  the  feasibility  and  QP  phases  I.»  particular,  each  iteration  involves  an  update  of  the  TQ 
factorization  of  the  working  set.  It  is  typical  for  the  subproblems  in  later  major  iterations  of  an 
SQP  method  to  reach  optimality  in  a  single  iteration  because  the  optimal  active  set  is  available 
before  solving  the  subproblem.  In  this  case,  the  phase-1  procedure  merely  performs  a  feasibility 
(  beck  that  would  be  required  in  any  case. 

In  our  implementation,  the  computations  in  both  phases  are  performed  by  exactly  the  same 
modules.  The  two-phase  nature  of  the  algorithm  is  reflected  by  changing  the  function  being 
minimized  from  the  sum  of  infeasibilities  to  the  quadratic  objective  function.  The  feasibility  phase 
does  not  perform  the  standard  simplex  method  (i.e.,  it  does  not  find  a  vertex),  and  requires  no 
additional  work  in  the  later  iterations  of  an  SQP  method. 


4.  Factorization  of  the  working  set 

The  quadratic  programming  method  of  Section  2.1  requires  an  explicit  matrix  Q  in  order  to 
perform  updates  to  the  working  set.  In  this  section,  we  consider  the  evolution  of  a  quadratic 
programming  method  with  respect  to  its  computation  of  the  initial  Q. 

As  mentioned  previously,  the  active  set  of  each  subproblem  in  an  SQP  method  eventually 
becomes  the  active  set  of  the  nonlinear  problem.  Since  the  iterations  in  a  QP  method  are  directed 
iitirely  toward  finding  the  active  set,  it  would  appear  that  a  good  initial  estimate  of  the  active 
-e*  could  permit  later  subproblems  to  reach  optimality  in  only  one  iteration. 

Our  first  idea  along  these  lines  was  to  include  a  “warm  start”  option  in  the  QP  method, 
i  i'  .  to  specify  the  desired  initial  working  set  as  input  to  the  feasibility  phase  (for  details,  see 
i  Jill  rt  nl..  1985a).  Beyond  adding  this  option,  however,  the  original  implementation  retained  in 
buge  part  the  philosophy  that  the  eventual  SQP  method  should  utilize  as  much  as  possible  an 
'  olM  lie-shelf"  version  of  the  quadratic  programming  method  discussed  in  Section  2.1. 

('(imputation  of  the  TQ  factorization  associated  with  a  spirilied  working  set  may  be  viewed 
.i-  updating  an  existing  factorization  as  new  rows  are  added  in  the  last  position.  Assume  that 
tin  TQ  factorization  (.'»)  of  ('  is  available,  and  consider  the  matrix  (' ,  which  is  (’  augmented  by 
1  lie  row  r  .  Then 

■(£)«=*=(.*  £)•  w 

where  s  and  t  are  the  relevant  partitions  of  QTc.  Let  Q  denote  a  Householder  matrix  of  the  form 


Q 


'..here  the  vector  n  and  scalar  /I  are  chosen  to  annihilate  all  but  the  last  element  of  s,  and  to  leave 
l  unchanged.  (For  details  of  how  these  quantities  are  defined,  see  Stewart,  I97e  )  Then 


(Q  (0  T  )  , 
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whore  Q  -  QQ. 

In  a  general-purpose  QP  algorithm,  the  initial  TQ  factorization  of  C  would  probably  be 
computed  by  the  so-called  "standard”  procedure,  which  can  be  interpreted  as  a  version  of  (8)  and 
(0)  in  which  the  rows  of  ('  are  added  to  the  null  matrix  one  by  one.  The  initial  Q  matrix  is  taken  as 
the  identity,  and  the  initial  T  is  the  null  matrix.  While  computing  the  factorization,  the  sequence 
of  Householder  transformations  is  stored  in  compact  form  (i.e.,  Q  is  not  stored  explicitly);  the 
vector  eTQ  needed  in  (8)  is  obtained  by  applying  the  sequence  of  stored  transformations.  Once  the 
initial  factorization  has  berm  completed,  the  necessary  explicit  matrix  Q  is  obtained  by  multiplying 
the  compact  Householder  transformations  together  in  reverse  order. 

A  shift  in  perspective  occurs  when  this  computation  is  performed  within  an  SQP  method 
applied  to  a  problem  that  contains  both  linear  and  nonlinc; ir  constraints.  (We  shall  use  the  term 
linear  row  to  demote  a  row  of  the*  working  set  associated  with  a  linear  constraint  in  the  original 
problem,  and  similarly  for  nonlinear  row.)  Since  the  rows  of  A,  are  the  same  in  each  subproblem, 
it  seems  highly  desirable  to  avoid  refactorizat ion  of  linear  rows.  We  emphasize  that  by  considering 
this  possibility,  we  have  already  deviated  from  seeking  a  general-purpose  quadratic  programming 
code,  which  is  unlikely  to  make  distinctions  between  row  types.  Ilence.  to  take  advantage  of  the 
presence  of  linear  constraints,  a  more  specialized  QP  method  must  be  devised. 

Even  if  the-  QP  method  can  distinguish  among  rows,  it  is  still  not  straightforward  to  exploit 
constraint  linearities  in  computing  the  initial  TQ  factorization.  In  order  for  the  same  Householder 
transformations  to  be  used  again,  t lit'  order  of  the  rows  must  remain  unchanged.  However,  the 
order  of  the  rows  in  the  final  ('  is  determined  by  the  order  in  which  constraints  enter  or  leave  the 
working  set  during  the  iterations  of  the  quadratic  programming  method.  If  the  constraints  of  the 
nonlinear  problem  are  ordered  so  that  the  linear  constraints  precede  the  nonlinear  constraints,  and 
if  constraints  are  always  added  to  the  initial  working  set  in  order,  we  can  determine  in  the  major 
iteration  how  many  rows  of  C  have  not  changed  since  the  previous  subproblem.  Ideally,  the  part 
of  thc>  TQ  factorization  corresponding  to  the  unchanged  linear  rows  need  not  be  recomputed.  In 
particular,  if  the  first  rn,  rows  of  ('  are  linear,  the  implement  at  ion  should  lit'  able  to  utilize-  the  first 
m,  Householder  transformations  from  the  initial  TQ  factorization  of  the  previous  subproblcm. 
Eventually,  as  the  SQP  iterates  converge.  the'  active-  set  will  not  change  belwe-en  subproblems, 
and  .ill  the  Householder  transformations  corre-sponding  to  linear  rows  can  be-  save-el. 

I hiforl imate-ly.  this  aim  is  extremely  difficult  to  achie-ve-  within  an  implementation  that,  uses 
the-  standard  proceehiri-  to  obtain  the  initial  TQ  faetorizat  ion.  'I'lie-  explicit  Q  from  the-  pre-vious 
subproblcm  does  not  allow  reconstruction  of  tin-  Householde-r  transformations  from  which  it  was 
computed.  Then-fore-,  in  order  to  avoid  refactorizing  the-  line-ar  rows,  the  QP  method  would  have 
to  save  the  compact  transformations  corresponding  to  the  linear  constraints  at  the-  lie-ginning  e>f 
the  initial  working  set .  Even  then,  the-  save-el  compart  transformations  would  be  of  no  use  if  just 
one  bound  had  changed  status  during  the-  QP  iterations.  ('Pin'  column  dimension  of  (’  changes 
when  the  set  of  fre-e-  variables  is  altcre'd.) 

Although  this  analysis  is  discouraging,  we  persisted  in  trying  to  exploit  constraint  linearities 
in  I  lie  mil  ial  TQ  fact  oriy.al  ion  because  of  I  lie  large  number  of  pract  iral  problems  with  a  significant 
number  of  linear  constraints  and  bounds.  'I’lie  result  was  to  change  the  definition  of  both  the  QP 
and  the  SQP  methods.  Rather  than  attempt  to  define  some  way  for  the  quadratic  programming 
method  to  distinguish  among  row  types,  computation  of  the  initial  TQ  factorization  was  removed 
from  the  QP  method  proper  to  the  major  iteration,  which  "naturally"  has  access  to  information 
about  which  constraints  are  linear  and  nonlinear.  A  necessary  consequence  was  that  the  quadratic 
programming  method  must  include  a  hot  star/  option,  in  which  the  input  parameters  of  the 
quadratic  program  include  not  only  the  specification  of  a  working  set.  but  also  its  associated  TQ 


factorization. 

In  addition  to  t his  change  in  structure  of  the  quadratic  programming  method,  t lie  procedure 
for  computing  the  initial  factorization  was  changed.  In  the  updating’  method,  the  explicit  matrix 
(J  from  the  previous  suhproblem  is  taken  as  the  initial  matrix  Q  in  (8).  In  contrast  to  the 
standard  procedure,  where  the  initial  Q  is  taken  as  the  identity  matrix,  this  means  that  each 
new  row  must  he  transformed  by  a  full  orthogonal  matrix  rather  than  a  sequence  of  Householder 
transformations,  and  that  each  Householder  transformation  must  be  multiplied  into  Q  after  the 
corresponding  row  has  been  transformed.  The  benefit  from  this  procedure  with  respect  to  linear 
rows  of  ('  is  that  the  final  matrix  Q  from  the  previous  subproblem  automatically  transforms  the 
initial  constant  rows  of  ( '  into  reverse-triangular  form  (regardless  of  any  changes  that  occurred 
in  the  status  of  bound  constraints).  Thus,  no  work  is  required  to  obtain  the  first  m,.  rows  of  T. 

This  example  of  the  shift  in  a  method  because  of  implementation  is  highly  relevant  because 
of  the  complex  issues  that  led  to  our  choice  of  the  updating  method.  It  should  be  emphasized 
that  the  updating  method  can  be  less  expensive  than  the  combination  of  the  standard  method 
with  the  special  formation  of  Q,  depending  on  the  number  of  linear  rows.  (For  detailed  operation 
counts,  see  Gill  et  ah,  19856.) 

5.  Tlic  initial  projected  Hessian  for  indefinite  quadratic  programs 

Our  quadratic  programming  algorithm  does  not  require  //  to  be  positive  definite,  and  hence  must 
allow  f<  possible  indofinitenoss  and  singularity  of  the  projected  Hessian  in  the  QP  phase.  (The 
quadratic  programming  subproblems  in  an  SQP  method  may  be  indefinite  when  exact  second 
derivatives  are  used  to  define  the  Hessian.)  The  treatment  of  indofinitenoss  depends  critically 
on  tin'  result  that,  if  the  initial  projected  Hessian  is  positive  definite,  the  projected  Hessian  can 
thereafter  become  indefinite  only  when  a  constraint  is  deleted  from  the  working  set  (see  Gill 
and  Murray,  1978).  This  property  implies  that  indefinitencss  can  bo  controlled  explicitly,  and 
permits  a  numerically  stable  Cholesky  factorization  to  be  used  in  solving  (G)  for  px,  even  when 
the  projected  Hessian  is  indefinite.  (For  details,  see  Gill  and  Murray,  1978.  and  Gill  et  ah,  1985a.) 

In  an  implementation,  the  initialization  procedure  must  therefore  determine  a  positive-definite 
projected  Hessian.  How  can  this  lx*  achieved  most  effectively?  Wo  consider  several  strategies  for 
initializing  the  projected  Hessian,  all  of  which  are  based  on  the  observation  that  the  projected 
Hessian  mat  rix  will  be  positive  definite  if  enough  const raints  are  included  in  (he  mil  ial  working  set. 
(The  null  matrix  is  positive  definite  by  definition,  corresponding  to  the  case  when  ('  contains  nKH 
constraints.)  r,'his  suggests  somehow  adding  constraints  to  the  working  set  to  make  the  projected 
Hessian  positive  definite. 

The  first  strategy  deals  with  indefiniteness  through  the  initialization  of  the  feasibility  phase. 
The  initial  working  set  in  the  phase- 1  procedure  is  constructed  to  contain  rt,,„  constraints  (i.e., 
to  define  a  vertex)  by  augmenting  the  "natural"  working  set  (the  constraints  exactly  or  nearly 
satisfied  at  the  starling  point)  with  a  suitable  number  of  "temporary”  bounds,  each  of  which  has 
the  effect  of  temporarily  fixing  a  variable  at  its  current  value.  Since  tin*  phase- 1  procedure  is 
equivalent  to  the  simplex  method  if  started  at  a  vertex,  the  final  working  set  of  the  feasibility 
phase  will  also  be  a  vertex.  Hence,  the  initial  Z  of  the  t^F  phase  will  be  null,  and  the  projected 
Hessian  will  be  (trivially)  positive  definite. 

A  second  option  is  to  carry  out  a  phase- 1  procedure  that  does  not  require  creation  of  a  vertex, 
and  to  form  the  projected  Hessian  as  soon  as  a  feasible  point  is  found,  using  the  final  Z  of  the. 
feasibility  phase.  If  indefinitencss  is  detected  while  performing  the  Cholesky  factorization  of  the 
projected  Hessian,  the  current  point  can  be  made*  into  a  “vertex”  by  adding  temporary  bounds 
as  described  above,  along  with  updating  the  TQ  factorization. 
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With  cither  of  these  techniques,  the  working  set  for  the  initial  quadratic  programming  itera¬ 
tion  is  given  hy  the  square  matrix 


where  E  includes  rows  of  the  identity  matrix  corresponding  to  the'  temporary  bounds.  In  subse¬ 
quent  iterations,  a  temporary  bound  is  treated  as  a  standard  constraint  until  it  is  deleted  from 
the  working  set.  in  which  case  it  will  never  be  added  again. 

Moralise  there  seemed  to  be  no  absolute  theoretical  grounds  for  choosing  between  the  above 
approaches,  our  initial  implementation  of  the  quadratic  programming  algorithm  used  the  first 
strategy,  for  two  reasons.  When  computing  the  TQ  factorization  from  scratch,  adding  bounds  is 
free  .  and  the  size  of  the  first  TQ  factorization  that  must  be  computed  decreases  as  more  bounds 
are  included  in  the  initial  working  set  (for  detailed  analysis  of  the  operations  required,  see  Gill  et 
;il..  1981c):  computing  the  initial  TQ  factorization  of  phase  1  is  cheaper  with  the  lirst  strategy. 
Furthermore,  the  second  approach  has  the  disadvantage  that  all  the  work  of  the  initial  Cholesky 
factorization  in  the  QP  phase  might  be  wasted,  depending  on  the  (unknown)  probability  tln*^  *•'*« 
projected  Hessian  is  positive  definite  at  the  first  feasible  point. 

The  resulting  implementation  of  the  initialization  performed  exactly  as  intended.  a  ver,  a 
close  analysis  of  the  computation  associated  with  each  quadratic  programming  suhproblem  within 
an  SQl'  method  revealed  that  the  creation  of  a  temporary  vortex  at  the  beginning  of  phase  *  4 

less  than  satisfactory.  To  see  why.  assume  that  the  initial  projected  Hessian  is  positive*  definite. 
In  this  case',  each  of  the  temporary  bounds  must  he'  elele-te-d  in  turn,  followed  by  minimization 
of  the'  quadratic  lune'tion  within  a  subspace'  of  inere-ased  dimension;  the'  e-lfe'ct  of  these  moves 
is  implicitly  to  compute  the'  ('Imh'sky  factor  of  the*  projected  Ih-ssian  by  rows.  In  the'  I'xtreme 
case'  of  a  eju.nlr.it  ie-  program  with  an  imconst  mined  sedution.  the-  work  re-ejuiml  to  optimize  the 
quadratic  function  on  n  te'uqmrnry  manifolds  is  more'  than  twice-  the'  work  re-epiired  to  compute 
the'  ti  x  n  Chole  sky  factor  directly. 

Although  one  might  argue  that  a  siiire-ssful  algorithm  slmuld  not  be-  distort  c*d  te>  cater  for 
this  special  e  ase',  flirt he-r  e  xamination  revealed  that  inellie  ie-ney  weitilel  be  the-  rule1  rather  than 
the1  exempt ie>n.  As  an  SQl’  me-lhoel  proceeds.  the'  active-  se-ts  of  QP  subproble-ms  in  late-r  major 
iterations  become'  the-  same  the-  set  of  constraints  ae  five-  at  the’  solution  of  the-  eirigina)  uou  linear 
problem  (se'e  Robinson.  197  t.  fe >r  a  proof).  Therefore',  when  seilving  the-  se'qne'tu'e'  e>f  ejuaelratic 
programs  t  hat  arise  wit  bin  an  SQl’  met  hoel,  tlie-re'  is  a  high  probability  I  hat  the'  active  se-t  from  erne 
so bprol lie'll  1  is  also  t  lie'  correct  active-  sed  for  t  lie-  ne-xt .  In  t  his  context .  t  he-  init  lal  pre »je'e  t e-el  lle-ssiail 
is  like'ly  to  be  positive-  deTmife'.  ami  creviting  the'  temporary  ve-rte'x  is  untnee'ssary.  Wliem  the  initial 
working  se  t  happens  t e >  be  optimal,  the-  process  of  creating  ami  them  e  liminating  the-  te-nqmrary 
bounds  requiri's  approximate*]}*  twice*  the-  we>rk  that  wotilel  be-  re'epiireel  to  compute'  tile-  ('hole'sky 
fae  I  e  >r  directly.  Furl  her  more.  e-omput  ing  a  smalle-r  in  it  ial  TQ  fae  ton /.at  mn  with  te-mporary  bounds 
elite's  Heel  leael  lei  all  over, ill  saving  e  1 1  work.  si  nee  ■  the-  factorization  must  eventually  be  upelate-el  to 
I'elh  e  I  I  lie'  deletion  ol  I  he  unnecessary  bounds. 

Give-11  tin-  observation,  il  might  a|>|icar  that  the1  see  diii!  strategy  mint  ioneel  above*  would 
be  reviveei  However,  the'  following,  even  better,  strategy  was  then  de'Vele >|>eel .  Rather  than 
aelilmg  bom  m  1  e-onsl  rainls  to  the-  working  set  tei  create  a  temporary  verte-x.  the  new  strategy 
aelds  a.-  many  genera/  con-1  rainls  as  ne-i  es-ary  to  i  nsure  a  |io-il  1  ve-  eii'tinil e  projected  He  ssian,  but 
requires  no  fnrtln  r  work  to  update  the  TQ  t.u  t on/at um  or  to  1  .impute  the  projected  Hessian.  The 
e  e  nn  put  at  ion  jiroe  e-eels  as  follows.  At  t  lie  beginning  of  tin  QP  phase*.  t  lie  working  set  ('  ami  its  TQ 
fact ori/at ion  (.'!)  an1  available'.  The  matrix  Z7///  is  formed,  and  the-  Choli'sky  procedure*  with 
symmetric  mtcreh;iii"c>  is  initiated.  Recall  that  the  <  'hole-sky  procedure'  without  interchange's 
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will  break  down  if  the  matrix  is  not  positive-  definite.  However,  by  performing  interchanges  (such 
that  the  coluiiin  with  largest  positive  diagonal  element  is  processed  next  at  each  stej)J,  wc  can 
identify  the  largest  possible  positi ve-delinit e  principal  minor. 

In  algebraic  terms,  assume  that  a  permutation  matrix  P  has  been  chosen  so  that  the  upper 
left  submatrix  of  PTZ  1  // Z P  is  positive  delinite.  In  effect,  the  columns  of  ZP  are  partitioned  as 
ZP  :  (  Z\  Z-i  ).  sui-h  that  Z[lIZl  is  positive  definite,  i.c., 

zjnzl  =  rIrz. 

A  working  set  for  which  Z \  defines  the  null  space  can  be  obtained  by  including  the  rows  of  Z]  as 
temporary  general  constraints.  After  /’  is  determined  (by  the  (’lioleskv  procedure),  the  columns 
of  Z  are  reordered  (i.c..  Z  is  replaced  by  ZP):  note  that  the  properties  of  Z  as  a  basis  for  the  mil! 
space  of  ( '  are  unaffected  by  its  column  ordering.  The  minimization  of  the  quadrat!*'  function 
then  proceeds  within  the  subspace  defined  by  Z\. 

We  discuss  here  only  the  case  when  Q  is  orthogonal.  (For  details  about  the  case  when  Q  is  a 
product  of  stabilized  elementary  transformations,  see  (Jill  of  af.,  1985a.)  In  contrast  to  (10),  the 
temporarily  augmented  working  set  is  given  by 


so  that  p  will  satisfy  ('ji  -  0  and  Zjp  --  0.  By  definition  of  the  TQ  factorization,  C  automatically 
satisfies  the  following: 

V)M°  f) ' 

w  here 


and  hence  the  TQ  factorization  of  (II)  is  free. 

The  implementation  of  this  procedure  involves  several  subtle  points.  The  matrix  Z2  need 
not  be  kept  fixed  at  it-  initial  value,  since  I  he  role  of  the  extra  constraints  is  purely  to  define  an 
appropriate  null  space,  the  TQ  factorization  can  therefore  be  updated  in  the  normal  fashion  as 
ilu  iterations  of  the  quadratic  programming  method  proceed.  No  work  is  required  to  "delete"  the 
temporary  constraints  associated  with  /•_>  when  Z  [  */  0,  since  this  simply  involves  repart  it  ioning 

Q  When  deciding  whu  li  constraint  to  delete,  the  multiplier  vector  associated  with  the  rows  of 
Z,  is  given  by  Z!<i.  ami  the  multipliers  corresponding  to  the  rows  of  the  '  true'  working  set  (7 
are  the  lea'i -squares  multipliers  that  would  be  obtained  if  the  temporary  constraints  were  not 
present  | see  (all  ct  :tl.  1985a). 

Although  uni  might  <  hum  that  I  lie  final  method  could  have  been  developed  independently 
of  any  -oil  ware,  we  in  I  n  ■  ve  that  it  is  mill  l.ely  that  all  the  ramilic.it  ions  of  each  (hone  could  have 

be* 1 11  fully  muleisl . 1  without  careful  implementation  and  detailed  a  posteriori  analysis  of  the 

eomputat  ion. 

G.  Representation  of  the  Hessian 

Section  1  has  described  the  (linages  in  representation  of  a  factorization  of  the  working  set  in  our 
quadratic  programming  method  because  of  its  intended  use  within  an  S(.)P  method  Wc  now 
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turn  to  an  even  more  specialized  question:  the  representation  of  the  Hessian  matrix  in  a  quasi- 
Newton  SQP  method,  i.e..  one  in  which  II  is  a  posit ive-detiuite  approximation  to  the  Hessian  of 
the  Lagrangian  function  that  is  modified  by  a  low-rank  change  between  subproblerns. 

In  all  the  quadratic  programming  methods  discussed  thus  far.  the  Cholesky  factorization 
of  the  initial  projected  Hessian  is  computed  from  scratch  at  the  beginning  of  the  QP  phase. 
Although  this  procedure  is  perfectly  satisfactory  for  a  single  quadratic  program,  it  has  certain 
disadvantages  in  a  quasi-Newton  SQP  method.  In  particular,  when  applied  to  a  problem  with  no 
constraints  or  only  linear  constraints,  such  a  method  is  much  less  efficient  than  standard  quasi- 
Newton  methods  for  these  problems,  in  which  the  Hessian  approximation  is  updated  between 
iterations.  The  question  therefore  arises:  can  an  SQP  method  maintain  an  efficient  treatment  of 
nonlinear  constraints,  yet  remain  competitive  with  unconstrained  or  linearly  constrained  quasi- 
Newton  methods  if  constraint  nonlinearities  are  not  present? 

Such  efficiency  can  be  guaranteed  if  an  algorithm  satisfies  a  specific  criterion:  when  the 
working  set  includes  rns  nonlinear  rows,  the  initialization  of  T,  Q  and  11/  should  require  O + 
)  operations.  This  criterion  is  not  satisfied  by  the  hot-start  option  described  in  Section 
4.  since  changes  in  Z  mean  that  both  ZTHZ  and  It/  must  be  computed  from  scratch.  Our 
approach  involves  recurring  Rq,  the  Cholesky  factor  of  the  transformed  Hessian  approximation 
Q7 IIQ  ( 1;  Hw)  in  both  the  major  and  minor  iterations.  (The  form  (4)  of  Q  implies  that  the 
matrix  11/  needed  to  compute  the  search  direction  is  simply  the  upper  loft  corner  of  Rq-) 

To  illustrate  how  the  method  works,  consider  the  case  when  the  working  set  at  a  given 
iteration  contains  m,  linear  rows  and  a  single  nonlinear  row.  Assume  that,  on  completion  of  the 
QP  subproblem  at  the  point  x.  is  available.  As  indicated  by  (8)  (9).  the  effort  of  replacing 
the  last  row  of  the  working  set  is  to  post-mult iply  Q  (and  therefore  RQ)  by  a  matrix  of  rank  one. 
Since  Q  =  QQ,  whore  Q  —  /  (1  //f)mtr,  we  have 

RqQ  =  Rq  =  Rq  -  ~Rquut.  (12) 

r' 

The  new  Cholesky  factor  Rq  is  then  found  by  constructing  an  orthogonal  matrix  P  that  restores 
upper-triangular  form  to  Rq,  i.e., 

Rq  ~  PRq. 

A  suitable  matrix  P  can  be  constructed  from  two  sweeps  of  plane  rotations;  for  more  details, 
see  (Jill  et  a I.  (1974).  In  general,  if  the  working  sot  contains  m,v  nonlinear  rows,  rnN  rank-one 
updates  must  be  applied  to  obtain  RQ. 

(liven  this  procedure  for  updating  Rq,  we  now  show  how  to  recur  the  transformed  gradient 
<lw,  which  changes  in  two  different  situations.  First,  as  constraints  enlor  or  leave  the  working 
set.  the  plane  rotations  used  to  update  Q  can  simply  be  applies]  to  qq.  The4  second  change  in  qQ 
occurs  when  p  is  replaced  by  p  p  f  oAp.  where  the  search  direction  h>  is  defined  as 


Lot  qq  denote  the  transformed  gradient  at  p.  It  follows  from  (13)  and  the  definitions  of  q  and  RQ 
that 


qq  --  QTc  4  QTII{}>  +  ntip) 
QT(e.  \  Up)  +  nQTH ftp 
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.sJLAmj 


=  q<3  +  aQ  HQ 


u 


=  <7«  + 


(t). 


which  shows  that  II  is  not  required  to  update  qQ. 

In  order  to  avoid  access  to  H  in  the  quasi-Newton  update  between  subproblems,  RQ  can 
be  updated  directly.  In  this  situation.  Q  remains  unchanged,  and  H  undergoes  a  rank-two 
modification.  The  BFGS  update  (7)  of  II  leads  to  the  following  change  in  HQ: 


II Q  Hq  —  HqsqsqHq  +  _ 


(14) 


'<3  “9 


where  yQ  =  QT(ge  —  9t)<  =  QT(x  —  x)  and  <ji  denotes  the  gradient  of  the  Lagrangian  function. 

This  update  may  be  expressed  as  a  rank-one  update  to  RQ  (see  Dennis  and  Schnabel,  1981).  Let 
a  and  7  denote  the  scalars  {8qHQ8Q)$  and  (Vq*q ) '  respectively.  The  updated  matrix  (14)  may 
then  be  written  as  II Q  —  RqRQ,  where 


Rq  =  Rq  +  VVUT,  with  V  =  —  RqSq ,  w  =  -yQ - HqSq. 


Again,  the  matrix  Rq  may  be  restored  to  upper-triangular  form  by  two  sweeps  of  plane  rotations. 

It.  should  be  emphasized  that  these  changes  in  the  representation  of  the  projected  Hessian 
imply  that  the  matrix  Rq  must  be  updated  during  the  feasibility  phase  as  well  as  the  QP  phase. 
The  SQP  method  is  also  altered  in  a  fundamental  way.  Now.  the  factor  Rq  is  altered  by  three 
sources:  changes  in  the  constraint  gradients  (sec  (12));  changes  in  the  curvature  of  the  Lagrangian 
function  (the  quasi-Newton  update);  and,  finally,  changes  in  the  prediction  of  the  active  set. 

Several  interesting  research  issues  have  resulted  from  these  changes  in  the  SQP  method.  If 
the  correct  r/-superlinear  convergence  rate  is  to  be  achieved  at  the  solution,  it  is  necessary  to 
show  that  small  changes  in  the  variables  lead  to  small  changes  in  the  matrix  Rq  (for  details  of 
the  proof,  see  Gill  e<  a/.,  1985/j).  Similarly,  certain  choices  of  least-squares  multiplier  estimates 
lead  to  methods  similar  to  projected  quasi-Newton  methods  (see,  e.g.,  Murray  and  Wright,  1978; 
Golcmnn  and  Conn,  1982;  Gabay.  1982;  and  Nocedal  and  Overton.  1983;  Byrd  and  Schnabel, 
1984).  In  the  next  section,  we  shall  discuss  a  class  of  new  methods  with  a  more  specialized 
treatment  of  linear  constraints. 


7.  A  more  specialized  treatment  of  linear  constraints 

The  ability  to  update  an  approximation  to  the  Hessian  of  the  Lagrangian  function  as  each  new 
nonlinear  row  is  factorized  leads  to  a  class  of  methods  with  separate  active-set  strategics  for  the 
linear  awl  nonlinear  ronsl mints.  Instead  of  using  the  quadratic  programming  subproblem  to 
define  the  complete  working  set.  the  ‘‘active”  linear  constraints  and  bounds  are  determined  in  the 
major  iteration  by  ail  active-set  strategy  typical  in  methods  for  linearly  constrained  optimization 
(see  Gill  and  Murray,  1974;  Gill,  Murray  and  Wright,  1981).  With  this  approach,  the  linear 
rows  never  need  to  be  rofnrtorized  even  during  early  major  iterations,  before  the  active  linear 
const  raints  have  been  determined. 


In  sudi  a  method,  an  initial  point  is  found  that  is  feasible  with  res  pert  to  thi'  linear  constraints 
and  hounds  (see  Section  3),  Within  each  major  iteration,  a  working  set  of  bounds  and  linear 
constraints  is  defined  in  the  usual  way.  Let  (',  denote  the  submatrix  of  general  linear  constraints 
in  the  working  set  corresponding  to  the  free  variables,  with  Z,  the  corresponding  basis  for  the 
null  space.  The  search  direction  is  then  determined  from  the  modified  quadratic  programming 
subproblem 


minimize  ijTp+^pTIIp 

p 

subject  to  £  <  Asp  <  u , 

C,p  =  0, 

where  /t  s.  denotes  the  columns  of  the  Jacobian  of  nonlinear  constraints  corresponding  to  the  free 
variables. 

During  the  major  iterations,  the  TQ  factorization  of  (',  is  recurred  as  linear  constraints  enter 
and  leave  the  working  set.  These  changes  are  reflected  in  the  Cholesky  factor  I fL  of  Z[H Z 
and  are  identical  to  the  updates  that  occur  in  an  active-set  method  for  linearly  constrained 
minimization  i.o..  It,  grows  by  a  row  and  column  when  a  constraint  leaves  the  working  set, 
If ,  shrink-  by  a  row  and  column  whim  a  constraint  enters  the  working  set.  Before  solving  the 
QP  subproblem,  each  nonlinear  row  is  added  to  the  overall  working  set.  leading  to  updates  to 
the  T(J  factorization  and  to  If,  ,  as  in  (12).  (The  matrix  If/  required  to  compute  the  quadratic 
programming  search  direction  is  in  the  upper  left-hand  corner  of  If,.)  On  completion  of  the 
subprobleni.  If,  is  updated  directly  to  incorporate  new  curvature  of  the  Lagrangian  function. 

If  the  working  set  contains  m>  nonlinear  constraints,  the  method  becomes  a  “projected  quasi- 
Newton  met  bod  (see  dill.  Murray  and  Wright.  1(J81).  In  general,  the  sequence  of  iterates  will 
dill’er  from  that  generated  by  the  SQP  method  bi-cause  only  one  linear  constraint  enters  or  leaves 
the  working  set  during  each  major  iteration.  Tins  method  is  likely  to  be  efficient  on  problems  in 
which  relatively  many  constraints  are  active  at  the  solution.  If  the  initial  working  set  of  hounds 
and  linear  constraints  is  large,  curvature  information  can  be  acninmlnted  in  It,  without  the  risk 
of  deleting  too  many  constraints  during,  a  single  major  iteration.  Under  these  circumstances,  the 
matrix  Zj  HZ  ,  may  be  updated  with  a  poMtivc -delinite  quasi-Newton  approximation  when  the 
lull  II  cannot. 


8.  Representation  of  the  Hessian  in  sparse  quadratic  programs 

All  the  eomplexit  ies  of  iinplenieiitat  ion  incut  ioiied  earlier  are  magnified  when  developing  met  hods 
for  large-scale  optimization  To  indicate  the  shift  in  perspect ive,  we  consider  an  SQP  method 
for  solving  nonliiiearly  constrained  problems  in  which  the  Hessian  of  the  Lagrangian  is  sparse 
and  known  exactly.  'Pirns  in  each  QP  siibprohlem.  II  is  available  in  some  form  that  allows  the 
quadratic  objective  and  its  gradient  i/  (  ll]i  1  </)  to  be  computed. 

ffeeall  that,  in  the  QP  phase,  tin-  search  direction  is  taken  as  Zj>/.  where//*  satisfies 


ZrHZp/  =  -ZTq.  (15) 

The  first  issue  in  solving  (15)  is  the  representation  of  Z.  Although  methods  for  computing 
an  orthogonal  basis  Z  for  the  null  space  of  a  sparse  matrix  have  been  the  subject  of  much 
recent  research  (see.  e  g..  Coleman  and  I’otben.  198-1).  they  are  not  practical  for  sparse  quadratic 
programs  because  of  the  need  to  update  Z  at  every  iteration  of  the  QP  method.  Instead,  methods 
have  been  developed  in  which  the  matrix  Z  is  not  stored  explicitly. 


Assume  that  the  nFR  columns  of  the  working  set  C  are  ordered  so  that 

C=(B  S), 

where  the  rn  x  m  matrix  B  is  non-singular.  (In  practice,  the  columns  of  B  may  occur  anywhere 
in  C.)  The  matrix  Z  defined  by 


provides  a  basis  for  the  null  space  of  C.  and  is  called  the  reduced- gradicn t  form  of  the  null 
space  (see,  e.g.,  Murtagh  and  Saunders,  1978).  This  form  is  effective  for  sparse  problems  because 
operations  with  Z  and  ZT  may  lx-  performed  using  a  factorization  of  the  sparse  matrix  B.  Let  a 
denote  nKn  —  rn  (the  number  of  columns  of  S).  The  reduced- gradient  form  of  Z  can  been  used 
with  great  success  in  quasi-Newton  methods  for  sparse  linearly  constrained  problems  in  which  a 
is  small,  by  maintaining  a  dense  Oholesky  factorization  of  a  quasi-Newton  approximation  to  the 
projected  Hessian  (sec  Murtagh  and  Saunders,  1978). 

In  the  case  of  sparse  quadratic  programming,  a  factorization  of  the  projected  Hessian  matrix 
ZTIIZ  is  needed  to  solve  (15).  In  a  general  dense  QP  method,  the  initial  projected  Hessian  would 
simply  be  formed  at  the  start  of  the  QP  phase  by  multiplying  the  explicit  matrices  Z  and  H.  One 
might,  suppose  that  this  procedure  could  carry  over  to  the  sparse  case,  assuming  that  a  is  small 
enough  so  that,  the  explicit  projected  Hessian  ran  be  stored.  (Note  that  the  projected  Hessian 
will  generally  be  dense,  even  if  H  and  B  are  sparse.) 

Unfortunately,  even  if  a  is  small,  forming  the  explicit  initial  projected  Hessian  may  involve 
a  substantial  amount  of  work.  When  Z  is  defined  by  (1G),  computation  of  ZTHZ  requires  the 
solution  of  2s  systems  of  size  m  x  m.  For  this  reason,  if  the  number  of  iterations  in  the  QP  method 
is  small,  computation  of'the  first  search  direction  will  dominate  the  time  required  to  solve  the 
quadratic  program.  (This  situation  always  applies  during  the  last  few  major  iterations  of  an  SQP 
method.)  To  put  the  relative  costs  into  perspective,  note  that  computation  of  a  single  row  and 
column  of  ZTfI Z  requires  approximately  the  same  amount  of  work  as  a  single  iteration  of  the 
simplex  method  (i.e.,  two  linear  systems  of  order  rn).  If  ,s  =  100,  forming  the  initial  projected 
Hessian  would  require  the  equivalent,  of  100  iterations  of  the  simplex  method! 

An  alternative  approach  is  to  use  a  quasi-Newton  method  to  solve  the  quadratic  program. 
The  required  quasi-Newton  approximation  to  Z1  II Z  is  maintained  using  the  change  in  p  and 
q  between  successive  minor  iterates.  (//  is  needed  only  to  form  Up,  from  which  q  and  the  QP 
objective  function  an*  easily  obtained.) 

It  may  not  be  obvious  why  this  is  an  improvement..  If  the  exact  projected  Hessian  is  not  used, 
the  resulting  search  direction  is  no  longer  the  step  to  the  minimum  of  the  quadratic  function  in 
the  subspace  defined  by  Z.  In  effect,  the  cost  of  forming  the  initial  projected  Hessian  seems 
merely  to  have  been  spread  over  a  number  of  iterations,  since  at  least  a  iterations  should  be 
required  to  produce  the  "true"  projected  Hessian.  However,  in  the  SQP  context,  the  major  gain 
is  that  the  ('holesky  factor  of  the  projected  Hessian  at  the  solution  < one  quadratic  program 
can  he  used  to  initiate  the  solution  of  the  next.  This  approach  has  proved  to  ho  very  successful 
in  the  implementation  of  an  SQP  method  for  solving  large-scale  problems  arising  in  the  optimal 
distribution  of  electrical  power  (see  JJurchett,  llapp  and  Vierath,  1984). 

This  is  another  situation  in  which  changes  to  the  initialization  procedure  ultimately  produce 
a  method  which  is  quite  different  from  the  original.  During  early  iterations,  significantly  different 
working  sets  are  generated  by  the  quasi- Newton  quadratic  program  birause  the  working  set  is 
usually  altered  well  before  the  approximate  projected  Hessian  has  any  resemblance  to  the  exact 
projected  Hessian. 
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the  flow  of  information  is  in  one  direction  only.  However,  developments 
in  methods  and  software  are  intimately  related,  and  neither  is  complete  if 
considered  in  isolation.  In  this  paper,  we  illustrate  how  the  development 
of  numerical  software  has  Influenced  our  research  in  optimization  methods. 
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